15 research outputs found

    Identifying health status of wind turbines by using self organizing maps and interpretation-oriented post-processing tools

    Identifying the health status of wind turbines is critical to reducing the impact of failures on generation costs (between 25–35%). This is a time-consuming task, since a human expert has to explore turbines individually. Methods: To optimize this process, we present a strategy based on Self Organizing Maps, clustering, and a further grouping of turbines based on the centroids of their SOM clusters, generating groups of turbines that behave similarly with respect to subsystem failure. The human expert can diagnose the wind farm's health by analysing a small sample from each group. Post-processing tools such as Class panel graphs and Traffic lights panels enhance the conceptualization of the clusters, providing additional information about which real scenarios the clusters point to and contributing to a better diagnosis. Results: The proposed approach has been tested in real wind farms with different characteristics (number of wind turbines, manufacturers, power, type of sensors, ...) and compared with classical clustering. Conclusions: Experimental results show that the healthy, unhealthy and intermediate states have been detected. Moreover, the operational modes identified for each wind turbine improve on those obtained with classical clustering techniques, capturing the intrinsic stationarity of the data. Peer Reviewed. Postprint (published version)

    Learning from Incomplete Features by Simultaneous Training of Neural Networks and Sparse Coding

    In this paper, the problem of training a classifier on a dataset with incomplete features is addressed. We assume that different subsets of features (random or structured) are available at each data instance. This situation typically occurs in applications where not all the features are collected for every data sample. A new supervised learning method is developed to train a general classifier, such as a logistic regression or a deep neural network, using only a subset of features per sample, while assuming sparse representations of data vectors on an unknown dictionary. Sufficient conditions are identified such that, if it is possible to train a classifier on incomplete observations so that their reconstructions are well separated by a hyperplane, then the same classifier also correctly separates the original (unobserved) data samples. Extensive simulation results on synthetic and well-known datasets are presented that validate our theoretical findings and demonstrate the effectiveness of the proposed method compared to traditional data imputation approaches and one state-of-the-art algorithm. Affiliation: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Affiliation: Wang, Ziyao. South East University; China. Affiliation: Sole Casals, Jordi. University of Vic; Spain. Affiliation: Zhao, Qibin. Center for Advanced Intelligence Project; Japan. Venue: IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2021, New York, United States, IEEE

    Gene filtering with optimal threshold selection

    Gene filtering is a useful preprocessing technique often applied to microarray datasets. However, it is not common practice, because clear guidelines are lacking and it carries the risk of excluding some potentially relevant genes. In this work, we propose to model microarray data as a mixture of two Gaussian distributions, which allows us to obtain an optimal filter threshold in terms of the gene expression level. Affiliation: Bau Macia, Josep. Universidad de Vic; Spain. Affiliation: Sole Casals, Jordi. Universidad de Vic; Spain. Affiliation: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Affiliation: Lew, Sergio Eduardo. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina. Venue: The Barcelona International Conference on Advances in Statistics, Barcelona, Spain, Universidad Autónoma de Barcelona
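
The two-Gaussian idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the mixture is fitted or how "optimal" is defined, so the code below assumes a plain EM fit and takes the threshold as the point between the two means where the weighted component densities are equal. The function names (`fit_two_gaussians`, `filter_threshold`) and the synthetic expression levels are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM (assumed fitting method)."""
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise from the lower and upper halves of the sorted data.
    groups = [xs[:half], xs[half:]]
    means = [sum(g) / len(g) for g in groups]
    stds = [max(math.sqrt(sum((x - m) ** 2 for x in g) / len(g)), 1e-3)
            for g, m in zip(groups, means)]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = nk / len(xs)
    return weights, means, stds


def filter_threshold(weights, means, stds, steps=60):
    """Bisect between the two means for the point of equal weighted density.

    Assumes means[0] < means[1] and that each component dominates at its own mean.
    """
    def gap(x):
        return (weights[0] * normal_pdf(x, means[0], stds[0])
                - weights[1] * normal_pdf(x, means[1], stds[1]))

    lo, hi = means[0], means[1]
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid  # low-expression component still dominates
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic (made-up) log-expression levels: a low and a high population.
rng = random.Random(0)
levels = ([rng.gauss(4.0, 1.0) for _ in range(300)]
          + [rng.gauss(10.0, 1.5) for _ in range(200)])
weights, means, stds = fit_two_gaussians(levels)
threshold = filter_threshold(weights, means, stds)
kept = [x for x in levels if x >= threshold]  # genes above the threshold pass the filter
```

Under these assumptions, genes whose expression level falls below `threshold` are more likely to belong to the low-expression component and would be filtered out.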

    Decomposition methods for machine learning with small, incomplete or noisy datasets

    In many machine learning applications, measurements are sometimes incomplete or noisy, resulting in missing features. In other cases, and for different reasons, the datasets are originally small, and therefore more data samples are required to derive useful supervised or unsupervised classification methods. Correct handling of incomplete, noisy or small datasets in machine learning is a fundamental and classic challenge. In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing-feature imputation (data completion), classification of noisy samples and artificial generation of new data samples (data augmentation). We illustrate the application of these signal decomposition methods in diverse selected practical machine learning examples, including brain-computer interfaces, classification of epileptic intracranial electroencephalogram signals, face recognition/verification and water network data analysis. We show that a signal decomposition approach can provide valuable tools to improve machine learning performance with low-quality datasets. Affiliation: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Affiliation: Sole Casals, Jordi. Center for Advanced Intelligence; Japan. Affiliation: Marti Puig, Pere. University of Catalonia; Spain. Affiliation: Sun, Zhe. RIKEN; Japan. Affiliation: Tanaka, Toshihisa. Tokyo University of Agriculture and Technology; Japan

    Detection of Wind Turbine Failures through Cross-Information between Neighbouring Turbines

    In this paper, the time variation of signals from several SCADA systems of geographically close turbines is analysed and compared. When operating correctly, the turbines show a clear pattern of joint variation. However, a failure in one of the turbines causes the signals from the faulty turbine to decouple from the pattern. SCADA data is used, firstly, to derive reference signals describing this pattern and, secondly, to compare the evolution of different turbines with respect to this joint variation. This makes it possible to determine whether the group of turbines is behaving correctly, because the turbines maintain the normal operating pattern, or whether some of them have decoupled. The presented strategy is very effective and can provide important support for decision making in turbine maintenance and, in the near future, for improving the classification of signals used to train supervised normality models. In addition to being very effective, it is a strategy with low computational cost, which can add great value to the SCADA data systems present in wind farms. Peer Reviewed. Sustainable Development Goals: 7 - Affordable and Clean Energy; 7.a - By 2030, enhance international cooperation to facilitate access to clean energy research and technology, including renewable energy, energy efficiency and advanced and cleaner fossil-fuel technology, and promote investment in energy infrastructure and clean energy technology; 7.b - By 2030, expand infrastructure and upgrade technology for supplying modern and sustainable energy services for all in developing countries, in particular least developed countries, small island developing States and landlocked developing countries, in accordance with their respective programmes of support. Postprint (published version)
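
A minimal sketch of the cross-information idea described in the abstract above. The abstract does not specify how the reference signals are derived or how decoupling is measured, so this example assumes the per-timestep median across neighbouring turbines as the reference and the mean absolute deviation from it as the decoupling score. The turbine names, temperature readings and threshold are made up for illustration.

```python
from statistics import median


def reference_signal(signals):
    """Median across turbines at each time step (assumed reference definition)."""
    return [median(values) for values in zip(*signals.values())]


def decoupling_scores(signals):
    """Mean absolute deviation of each turbine from the shared reference."""
    ref = reference_signal(signals)
    return {
        name: sum(abs(v - r) for v, r in zip(series, ref)) / len(ref)
        for name, series in signals.items()
    }


def flag_decoupled(signals, threshold):
    """Names of turbines whose score exceeds a hypothetical threshold."""
    scores = decoupling_scores(signals)
    return sorted(name for name, score in scores.items() if score > threshold)


# Hypothetical bearing temperatures (degrees C) from four neighbouring turbines.
signals = {
    "T1": [50, 52, 55, 53, 51],
    "T2": [51, 53, 56, 54, 52],
    "T3": [49, 51, 54, 52, 50],
    "T4": [50, 53, 60, 66, 72],  # drifts away from the joint pattern
}
suspects = flag_decoupled(signals, threshold=3.0)
```

Using the median rather than the mean keeps a single faulty turbine from pulling the reference toward itself, which is one plausible reason to prefer it here. It remains a design assumption, not something stated in the source.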

    Serial-EMD: Fast Empirical Mode Decomposition Method for Multi-dimensional Signals Based on Serialization

    Empirical mode decomposition (EMD) has developed into a prominent tool for adaptive, scale-based signal analysis in various fields such as robotics, security and biomedical engineering. Since the dramatic increase in the amount of data places higher demands on real-time signal analysis, it is difficult for existing EMD and its variants to balance the growth of data dimension against the speed of signal analysis. In order to decompose multi-dimensional signals faster, we present a novel signal-serialization method (serial-EMD), which concatenates multi-variate or multi-dimensional signals into a one-dimensional signal and uses various one-dimensional EMD algorithms to decompose it. To verify the effects of the proposed method, synthetic multi-variate time series, artificial 2D images with various textures and real-world facial images are tested. Compared with existing multi-EMD algorithms, the decomposition time is significantly reduced. In addition, facial recognition with Intrinsic Mode Functions (IMFs) extracted using our method achieves higher accuracy than with those obtained by existing multi-EMD algorithms, which demonstrates the superior performance of our method in terms of the quality of IMFs. Furthermore, this method provides a new perspective for optimizing existing EMD algorithms, namely transforming the structure of the input signal rather than being constrained to developing envelope computation techniques or signal decomposition methods. In summary, the study suggests that the serial-EMD technique is a highly competitive and fast alternative for multi-dimensional signal analysis. Affiliation: Zhang, Jin. Nankai University; China. Affiliation: Feng, Fan. Nankai University; China. Affiliation: Marti Puig, Pere. Central University of Catalonia; Spain. Affiliation: Caiafa, César Federico. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Argentino de Radioastronomía. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Argentino de Radioastronomía; Argentina. Affiliation: Sun, Zhe. RIKEN; Japan. Affiliation: Duan, Feng. Nankai University; China. Affiliation: Sole Casals, Jordi. Central University of Catalonia; Spain

    Maximum likelihood Linear Programming Data Fusion for Speaker Recognition

    Biometric system performance can be improved by means of data fusion. Several kinds of information can be fused in order to obtain a more accurate classification (identification or verification) of an input sample. In this paper we present a method for computing the weights in a weighted sum fusion for score combinations, by means of a likelihood model. The maximum likelihood estimation is posed as a linear programming problem. Each score is derived from a GMM classifier working on a different feature extractor. Our experimental results assessed the robustness of the system to changes over time (different sessions) and to a change of microphone. The improvements obtained were significantly better (error bars of two standard deviations) than a uniform weighted sum, a uniform weighted product, or the best single classifier. The computational cost of the proposed method scales with the number of scores to be fused in the same way as the simplex method for linear programming.
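
For readers unfamiliar with score-level fusion, the sketch below shows only the weighted sum step that the abstract refers to. It does not reproduce the paper's maximum likelihood linear programming estimation of the weights: the weights, scores and decision threshold here are hypothetical values chosen for illustration, and the function names are not from the source.

```python
def fuse_scores(scores, weights):
    """Weighted sum fusion of per-classifier scores, normalised by the weight total."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


def accept(scores, weights, threshold):
    """Verification decision: accept the claimed identity if the fused score is high enough."""
    return fuse_scores(scores, weights) >= threshold


# Hypothetical scores from two classifiers built on different feature extractors.
scores = [0.2, 0.8]
uniform = fuse_scores(scores, [1.0, 1.0])   # baseline: uniform weighted sum
weighted = fuse_scores(scores, [1.0, 3.0])  # trusts the second classifier more
```

With uniform weights the two classifiers contribute equally. Non-uniform weights let a more reliable classifier dominate the decision, which is the quantity the paper proposes to estimate rather than set by hand.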

    Initialisation of Nonlinearities for PNL and Wiener systems Inversion

    Abstract. This paper proposes a very fast method for blindly initializing a nonlinear mapping which transforms a sum of random variables. The method provides a surprisingly good approximation even when the basic assumption is not fully satisfied. The method can be used successfully for initializing the nonlinearity in post-nonlinear mixtures or in Wiener system inversion, improving algorithm speed and convergence.
